A Provenance Model for Manually Curated Data

Authors

  • Peter Buneman
  • Adriane Chapman
  • James Cheney
  • Stijn Vansummeren
Abstract

Many curated databases are constructed by scientists integrating various existing data sources “by hand”, that is, by manually entering or copying data from other sources. Capturing provenance in such an environment is a challenging problem, requiring a good model of the process of curation. Existing models of provenance focus on queries/views in databases or computations on the Grid, not updates of databases or Web sites. In this paper we motivate and present a simple model of provenance for manually curated databases and discuss ongoing and fu-

Similar resources

Provenance in Manually Curated Databases

Many curated databases are constructed by scientists integrating various existing data sources. Most current approaches to provenance in databases are based on views and fail to take account of the added value of the work done by scientists in manually creating and modifying data. Capturing provenance in such an environment is a challenging problem, requiring changes in practice, changes to exi...

A Copy-and-Paste Model for Provenance in Curated Databases

Provenance is information describing the origin, construction, location, ownership, or other aspects of the history of an object. Previous work on provenance has concentrated on an understanding of how provenance is described when the data of interest has been derived by queries from other data sources, as is the case in data warehouses. In this paper we focus on another important class of data...

Improv: Flexible Data Provenance for Relational Databases

Curated databases, which consist of data extracted from original sources, printed articles, and other databases, are a valuable source of data for scientists. However, as curated databases aggregate information from multiple sources, the origin of the data elements can be lost. Because of this, curated databases often provide support for data annotations, which are pieces of extra information a...

Publishing DisGeNET as nanopublications

The increasing and unprecedented publication rate in the biomedical field is a major bottleneck for knowledge discovery in the Life Sciences. The manual curation of facts from published scientific papers is slow and inefficient, and therefore new approaches are needed that can enable the automatic, scalable and reliable extraction of assertions. While the publication of scientific assertions an...

Leveraging the Open Provenance Model as a Multi-tier Model for Global Climate Research

Global climate researchers rely upon many forms of sensor data and analytical methods to help profile subtle changes in climate conditions. The U.S. Department of Energy’s Atmospheric Radiation Measurement (ARM) program provides researchers with a collection of curated Value Added Products (VAPs) resulting from continuous sensor data streams, data fusion, and modeling. The ARM operati...

Publication date: 2006